ABSTRACT


A new decision-theoretic approach to Nonlinear Programming Problems
with stochastic constraints is introduced. The Stochastic Program (SP)
is replaced by a Deterministic Program (DP) in which a term is added to
the objective function to penalize solutions which are not "feasible in
the mean". The special feature of our approach is the choice of the
penalty function $P_E$, which is given in terms of the relative entropy
functional, and is accordingly called entropic penalty. It is shown
that $P_E$ has properties which make it suitable to treat stochastic
programs. Some of these properties are derived via a dual representation
of the entropic penalty which also enables one to compute $P_E$ more easily,
in particular if the constraints in (SP) are stochastically independent.
The dual representation is also used to express the Deterministic Problem
(DP) as a saddle function problem. For problems in which the randomness
occurs in the rhs of the constraints, it is shown that the dual problem
of (DP) is equivalent to Expected Utility Maximization of the classical
Lagrangian dual function of (SP), with the utility being of the
constant-risk-aversion type. Finally, mean-variance approximations of $P_E$
and the induced Approximate Deterministic Program are considered.


INTRODUCTION 


Mathematical Programming problems with stochastic constraints,

(SP)  $\inf\{g_0(x) : g(x,b) \ge a\}$,

depending on a random vector b, are the subject of our investigation.

A new decision-theoretic approach is suggested in the paper as a possible
way to treat these stochastic programs. The approach is based on imitating
the penalty function method of deterministic Nonlinear Programming.

In  this  method  the  constrained  problem  is  replaced  by  an  unconstrained 
one,  in  which  the  new  objective  function  has  the  property  of  "penalizing" 
(increasing  the  minimand)  violations  of  the  constraints.  With  an 
appropriate  interpretation  of  "violation  of  constraints"  in  the  stochastic 
case,  and  with  an  appropriate  choice  of  the  penalty  function,  to  reflect 
the  stochastic  environment  of  the  problem,  we  derive  a  deterministic 
problem  (DP)  replacing  (SP): 

(DP)  $\inf_x \{g_0(x) + p\,P_E(x)\}$

where p > 0 is a penalty parameter, and $P_E$ is our penalty function.
This function is given in terms of the relative entropy functional,
widely used in Statistical Information Theory, [5], [6].

If $f_b$ is the generalized density of the random vector $b \in R^k$,
and $D_k$ is the set of all generalized densities f of random vectors
$z \in R^k$ (all absolutely continuous with respect to a common nonnegative
measure dt), then the relative entropy $I(f,f_b)$ between the random
vectors z and b is

$$I(f,f_b) = \int f(t)\,\log\frac{f(t)}{f_b(t)}\,dt .$$

The penalty function is given by

$$P_E(x) = \inf_{f \in D_k} \Big\{ I(f,f_b) : \int g(x,t)\,f(t)\,dt \ge a \Big\}$$

and is called accordingly entropic penalty. The motivation for choosing
$P_E$ and the induced deterministic program (DP) is discussed in Chap. 1.
Properties of the entropic penalty, studied in Chap. 2, help further to
demonstrate the appropriateness of using (DP). It is shown that $P_E(x)$
penalizes "violation of constraint in the mean", i.e. $P_E(x) = 0$ if
$E g(x,b) \ge a$ and $P_E(x) > 0$ otherwise. In this sense (DP) is a
"relaxation" of the deterministic program

$\inf\{g_0(x) : E g(x,b) \ge a\}$

which can be recovered from (DP) by letting p be large enough. The
latter program includes in particular the familiar chance constraints
problem [2]. Another desirable property of $P_E$ is that surely infeasible
solutions, i.e. those x's that are infeasible for any realization of
the random vector b, are excluded from (DP), since for those (and only
those) $P_E(x) = \infty$. It is also shown that a greater "violation in the
mean" of a constraint results in a greater penalty.


Some of the above mentioned properties of the entropic penalty are
derived from its definition, while others rely heavily on a dual
representation of $P_E$, which also provides an easy way to compute it:

$$P_E(x) = \sup_{y \ge 0} \{ y^T a - \log E\, e^{y^T g(x,b)} \} .$$

The duality theory needed to obtain the dual expression is developed in
Chap. 3. This representation can be further simplified, and for
independent constraints (i.e. $g_i(x,b) = g_i(x,b_i)$ and the $b_i$'s are
independent random variables) it has an explicit representation in terms
of the function $\psi_i(t) = \log E\, e^{t\,g_i(x,b_i)}$ and its derivative. The
dual representation also enables us to express the deterministic problem
(DP) as a saddle-value problem, and finally to demonstrate that (DP)
is equivalent to the problem

$$\inf_x \sup_{y \ge 0} E\,U(\ell_b(x,y))$$

where U is the Constant Risk Aversion (CRA) utility function
$U(t) = -e^{-(1/p)t}$ and $\ell_b(x,y)$ is the classical Lagrangian corresponding
to (SP):

$$\ell_b(x,y) = g_0(x) - y^T (g(x,b) - a).$$

The important special case of (SP):

(SP-RHS)  $\inf\{g_0(x) : g(x) \le b\}$

is thoroughly discussed in Chap. 4. The outstanding result which is
obtained for such convex stochastic programs is the nature of the dual
problem to the primal entropic-penalty program (DP): the dual
decision-maker is an expected utility maximizer, possessing a CRA type
utility function U with an Arrow-Pratt risk indicator $(-U''/U')$ equal to
the reciprocal of the penalty parameter. While in the deterministic case
the dual problem is

$$\max_{y \ge 0} \big( \inf_x \ell_b(x,y) \big),$$

in the stochastic case our approach leads to the dual problem

$$\max_{y \ge 0} E\,U\big( \inf_x \ell_b(x,y) \big).$$

Expected Utility Maximization is one of the fundamental
approaches of Economics and Decision Theory under Uncertainty. The
fact that (DP) generates such a sound dual is perhaps the most convincing
argument in favor of the entropic penalty approach.

In Chap. 5 we obtain simple approximations of $P_E(x)$, in terms of
the mean vector and the variance-covariance matrix of the random vector
g(x,b). The approximated entropic penalty $\hat P_E(x)$ is then given as the
optimal value of a simple convex quadratic program with only nonnegativity
constraints, or, for independent constraints, by an explicit formula
involving $m_i(x) = E g_i(x,b_i)$ and $\sigma_i^2(x)$, the variance of $g_i(x,b_i)$:

$$\hat P_E(x) = \frac{1}{2} \sum_{i=1}^m \frac{1}{\sigma_i^2(x)} \big[\max(0,\ a_i - m_i(x))\big]^2 .$$

For stochastic RHS Problems the approximations reduce to:

$$\hat P_E(x) = \sup_{y \ge 0} \{ y^T (g(x) - \mu) - \tfrac{1}{2}\, y^T V y \}$$

where $\mu = E b$ and V is the variance-covariance matrix of b. The
approximation is exact if b is jointly Normal: $b \sim N(\mu, V)$.

Using these approximations in (DP) one obtains an Approximate
Deterministic Problem (ADP):

(ADP)  $\inf_x \{ g_0(x) + p\,\hat P_E(x) \}$.

As an illustration, for a stochastic RHS Problem with independent $b_i$'s
(having mean $\mu_i$ and variance $\sigma_i^2$) the Approximate Deterministic Problem is:

(ADP)  $\inf_x \Big\{ g_0(x) + \frac{p}{2} \sum_i \frac{1}{\sigma_i^2} \big[\max(0,\ g_i(x) - \mu_i)\big]^2 \Big\}$.

The latter program is similar to the one used in the classical
penalty function method for the constrained (deterministic) problem

$\inf\{g_0(x) : g(x) \le \mu\}$

except for the presence of the coefficients $1/\sigma_i^2$. The role of these,
in the stochastic case, is to attribute smaller significance to "more
ambiguous" constraints, i.e. those for which the rhs $b_i$ has larger
variance.
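
The effect of the $1/\sigma_i^2$ weights can be seen in a minimal numerical sketch. The function names and the toy numbers below are illustrative assumptions, not part of the paper; the penalty term is the independent-RHS expression $\frac{1}{2}\sum_i [\max(0, g_i(x)-\mu_i)]^2/\sigma_i^2$.

```python
def adp_penalty(g_values, means, variances):
    """Mean-variance entropic penalty for independent constraints g_i(x) <= b_i.

    Evaluates 0.5 * sum_i max(0, g_i - mu_i)^2 / sigma_i^2.
    """
    total = 0.0
    for g_i, mu_i, var_i in zip(g_values, means, variances):
        violation = max(0.0, g_i - mu_i)   # violation "in the mean"
        total += 0.5 * violation ** 2 / var_i
    return total


def adp_objective(g0_value, g_values, means, variances, p):
    """Objective of (ADP): g_0(x) + p * penalty, for already-evaluated g_0(x), g_i(x)."""
    return g0_value + p * adp_penalty(g_values, means, variances)


# Same violation in the mean (1.0), but the second right-hand side is
# noisier, so its violation is penalized less.
low_noise = adp_penalty([3.0], [2.0], [0.5])
high_noise = adp_penalty([3.0], [2.0], [4.0])
```

With these numbers the noisier constraint receives one eighth of the penalty of the sharper one, which is the "more ambiguous constraints matter less" reading given above.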

Problem (ADP) just mentioned, and a score of other problems
occurring in the paper, give rise to interesting problems in Nonsmooth
Optimization that may entail the use of numerical methods developed
for such purposes, see e.g. [1] and [7].

As a general introduction to existing methods in Stochastic
Programming, the reader is referred to the excellent review articles
by Dempster (Part I in [3]) and Kall [4].

CHAPTER 1 - THE ENTROPIC PENALTY APPROACH

Consider the nonlinear programming problem

(P)  $\inf\{g_0(x) : g(x,b) \ge a\}$,

where $x \in R^n$ is the decision vector; $b \in R^k$ and $a \in R^m$ are fixed
parameters, and g is the vector-valued constraint function
$g: R^n \times R^k \to R^m$. Let the feasible set be denoted by

$S_b = \{x : g(x,b) \ge a\}$.

Frequently (P) is converted to an unconstrained problem by adjoining
to the objective function $g_0(x)$ a penalty function P(x) and thus
replacing (P) with

$$\inf_x \{g_0(x) + p\,P(x)\} \tag{1}$$

where p > 0 is a penalty parameter. The function P(x) is generally a
distance function measuring how far x is from the feasible set, i.e.

$P(x) = \mathrm{dist}(x, S_b)$,

but it can also be given in terms of the distance between b and the
set

$S^{-1} = \{z : g(x,z) \ge a\}$,

i.e.

$P(x) = \mathrm{dist}(b, S^{-1}) = \inf\{\mathrm{dist}(b,z) : z \in S^{-1}\}$.

Problem (1) then becomes

$$\inf_x \Big\{ g_0(x) + p \inf_z \{\mathrm{dist}(b,z) : g(x,z) \ge a\} \Big\}. \tag{2}$$

The original problem (P) is in fact a special case of problem (1) with

$$P(x) = \begin{cases} 0 & \text{if } g(x,b) \ge a \\ \infty & \text{otherwise} \end{cases}$$

or with finite-valued P(·) but with penalty parameter p very large.
In other cases (1) (and hence (2)) can be viewed as a relaxation of (P).


Assume now (and henceforth in this paper) that the parameter vector
b is stochastic, with distribution function $F_b(\cdot)$, absolutely continuous
w.r.t. a nonnegative measure dt, and possessing a generalized density
(Radon-Nikodym derivative) $f_b(\cdot)$. Let $B \subset R^k$ be the support of b.

Looking back at problem (2), one should naturally think now of z as a
random vector. Thus it remains to interpret two things: (a) the meaning
of a "distance between two random variables" and (b) the meaning of
"$g(x,z) \ge a$" when z is random. As for point (a) there is a classical
answer, which is the fundamental concept in Statistical Information
Theory (see e.g. the book by Kullback [5]):

$$\mathrm{dist}(b,z) = I(f_z,f_b) = \int f_z(t)\,\log\frac{f_z(t)}{f_b(t)}\,dt \;{}^{*}$$

The integral $I(f_z,f_b)$ is the so-called relative entropy or divergence.
Its legitimacy as a "distance function" comes (among other things) from
its well-known property

Proposition 1. $I(f_z,f_b) \ge 0$ and is equal to zero if and only if
$f_z = f_b$.
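
Proposition 1 is easy to try out on a finite support, where the integral becomes a sum. The sketch below is only an illustration under that assumption (a discrete measure dt); the function name and the sample densities are hypothetical.

```python
import math


def relative_entropy(f, fb):
    """Relative entropy I(f, fb) = sum_t f(t) * log(f(t) / fb(t)) on a finite support.

    Terms with f(t) == 0 contribute 0; f(t) > 0 where fb(t) == 0 gives +infinity.
    """
    total = 0.0
    for ft, bt in zip(f, fb):
        if ft == 0.0:
            continue
        if bt == 0.0:
            return math.inf
        total += ft * math.log(ft / bt)
    return total


fb = [0.2, 0.5, 0.3]
same = relative_entropy(fb, fb)                      # zero, since f = fb
shifted = relative_entropy([0.1, 0.4, 0.5], fb)      # strictly positive
asymmetric = relative_entropy(fb, [0.1, 0.4, 0.5])   # arguments swapped
```

Swapping the arguments changes the value, so the relative entropy behaves like a distance only in the sense of Proposition 1; it is not symmetric.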


* This is a short notation for
$$\int\cdots\int f_z(t_1,\ldots,t_k)\,\log\frac{f_z(t_1,\ldots,t_k)}{f_b(t_1,\ldots,t_k)}\,dt_1\cdots dt_k .$$

As for the second point (b), we adopt the interpretation that
"$g(x,z) \ge a$ holds in the mean", i.e.

$E_z g(x,z) \ge a$.

The result is a penalty function $P_E(\cdot)$, called entropic penalty, which
is given by

$$P_E(x) = \inf_{f \in D_k} \Big\{ \int_B f(t)\,\log\frac{f(t)}{f_b(t)}\,dt : \int_B g_i(x,t)\,f(t)\,dt \ge a_i,\ i = 1,\ldots,m \Big\} \tag{3}$$

where $D_k$ is the set of all generalized densities of random vectors
$z \in R^k$, which are absolutely continuous w.r.t. the measure dt.

In terms of the entropic penalty, we introduce the Deterministic
Primal (DP) problem as a surrogate for the Stochastic Primal (SP)
problem:

(DP)  $\inf_x \{g_0(x) + p\,P_E(x)\}$.

Let us note that if x is such that $f_b$ itself satisfies the
constraint in (3), i.e.

$$E_b g_i(x,b) \ge a_i, \quad i = 1,\ldots,m, \tag{4}$$

then the optimal density is $f_b$ itself, and by Proposition 1 it follows
that $P_E(x) = 0$. At the same time, if x is such that (4) is violated
then $P_E(x) > 0$. Therefore, (DP) is a relaxation of the following, more
naive, deterministic replacement of (SP), namely

$$\inf\{g_0(x) : E_b g(x,b) \ge a\}. \tag{5}$$


As a concrete example, let g(x,b) be chosen as

$$g(x,b) = \begin{cases} 1 & \text{if } g(x) \le b \\ 0 & \text{if } g(x) \not\le b \end{cases} \tag{6}$$

and let $a = 1-\alpha$ $(0 < \alpha < 1)$. Problem (5) becomes the well-known
Chance Constrained program (see [2]):

(CC)  $\inf\{g_0(x) : \Pr(g(x) \le b) \ge 1-\alpha\}$.

The corresponding Deterministic Primal, which in this case is
denoted (CCDP),

(CCDP)  $\inf_x \Big\{ g_0(x) + p \inf_{f \in D_k} \Big\{ I(f,f_b) : \int_{g(x)}^{\infty} f(t)\,dt \ge 1-\alpha \Big\} \Big\}$,

penalizes violations of the chance constraints. (CC) can be recovered
from (CCDP) by choosing p sufficiently large.

CHAPTER 2 - PROPERTIES OF THE ENTROPIC PENALTY

In this Chapter some important properties of $P_E$ are derived.
Additional properties will be discussed in Chap. 3 as well. These
properties demonstrate the appropriateness of using the entropic penalty
for solving Stochastic Programming problems.

Proposition 2:

$$P_E(x) \begin{cases} = 0 & \text{if } E_b g_i(x,b) \ge a_i \quad \forall i \\ = \infty & \text{if } \bar g_i(x) = \sup_{b \in B} g_i(x,b) < a_i \text{ for some } i \\ \text{is positive and finite} & \text{otherwise} \end{cases}$$

Proof: By Proposition 1, $P_E(x) \ge 0$ with equality if and only if
the optimal f is equal to $f_b$ (a.e.); this is possible if and only if
$f_b$ is feasible, i.e. $E_b g_i(x,b) \ge a_i$, $\forall i$. It remains to show that
$P_E(x) = \infty$ if and only if $\bar g_i(x) < a_i$ for some i. The latter means
that the constraints in (3) are infeasible, implying $P_E(x) = \infty$. That
the opposite is also true (i.e., $P_E(x) = \infty$ implies (3) is infeasible)
follows from Theorem 1(b) in Chap. 3.

□

The proposition demonstrates that $P_E$ is a penalty function
for the constraints $E g(x,b) \ge a$ and a barrier function for the
constraints $\bar g(x) \ge a$. The Deterministic Primal problem (DP) can be
rewritten as

(DP)  $\inf_x \{g_0(x) + p\,P_E(x) : \bar g(x) \ge a\}$.

Note that $\bar g(x) \not\ge a$ means that x is not feasible for the original
(SP) problem for any realization of b, and exactly these surely
infeasible solutions are ruled out by (DP).


The next two results concern independent constraints. We say that
$g(x,b) \ge a$ are independent constraints if the components $(b_i)$ of b
are independent random variables, and if, for each i, the i-th constraint
depends only on $b_i$, i.e., k = m and

$g_i(x,b) = g_i(x,b_i)$.

We make it clear that in this case, the set $D_k$ in (3) is the set of
all generalized densities of random vectors $z \in R^m$ with independent
components, so $f(t_1,\ldots,t_m) = \prod_{i=1}^m f_i(t_i)$.

Proposition 3: For independent constraints, $P_E$ is given by

$$P_E(x) = \sum_{i=1}^m P_E^i(x) \quad\text{where}\quad P_E^i(x) = \inf_{f_i} \Big\{ \int f_i(t)\,\log\frac{f_i(t)}{f_{b_i}(t)}\,dt : \int g_i(x,t)\,f_i(t)\,dt \ge a_i \Big\} .$$

Proof: The result follows from the well-known additivity property
of the relative entropy for independent random variables ([5] Th. 2.1).

The proposition expresses the useful fact that, whenever the constraints
are independent, the penalty for the system of constraints
equals the sum of penalties for the individual constraints.

We say that $x^1$ is less feasible than $x^2$ for the i-th constraint
(in the mean) if

$E_b g_i(x^1,b) - a_i < E_b g_i(x^2,b) - a_i$.

Proposition 4: Let the constraints of (SP-RHS) be independent. If $x^1$ is
less feasible than $x^2$ for the i-th constraint, then

$P_E^i(x^1) > P_E^i(x^2)$.  □

The next result concerns Stochastic RHS problems:

(SP-RHS)  $\inf\{g_0(x) : g(x) \le b\}$.

This is a special case of (SP) with

$$g(x,b) = b - g(x), \quad a = 0. \tag{8}$$

Proposition 5: For a Stochastic RHS problem

$$P_E(x) = \inf_{f \in D_k} \Big\{ I(f,f_b) : \int t\,f(t)\,dt \ge g(x) \Big\}. \tag{9}$$

If (SP-RHS) is a convex program, then $P_E(x)$ is a convex function.

Proof: The equation (9) follows from a simple substitution of (8) in
(3). The convexity result will be proved in Chap. 4 via a dual
expression for $P_E$, from which the conclusion of Proposition 4 follows too.  □

A convexity result holds also for the chance constrained problem (CC).
The proof is also postponed to Chap. 3 (see Remark 1, following
Example 1).

Proposition 6: If (CC) has independent and concave constraints (i.e.
for each i and each $b_i$, $\Pr(g_i(x) \le b_i)$ is a concave function of x)
then $P_E(x)$ is a convex function.

CHAPTER 3 - A DUAL REPRESENTATION OF $P_E$ AND A SADDLE FUNCTION
REPRESENTATION OF (DP)

The value of the entropic penalty function $P_E$ at a given point x
is the optimal value of the extremal problem

(E)  $\inf_{f \in D_k} I(f) = \int f(t)\,\log\frac{f(t)}{f_b(t)}\,dt$

subject to
$$\int g_i(x,t)\,f(t)\,dt \ge a_i, \quad i = 1,\ldots,m. \tag{10}$$

We will write this shortly as

$P_E(x) = \inf(E)$.

By constructing a dual problem for (E), say (H), a dual representation
of $P_E$ will follow:

$P_E(x) = \sup(H)$.

To construct (H) we first need an auxiliary result.

Lemma 1: Let c(t) be a given positive summable function:

$\int c(t)\,dt = C < \infty$.

Then
$$\inf_{f \in D_k} \int f(t)\,\log\frac{f(t)}{c(t)}\,dt = -\log \int c(t)\,dt. \tag{11}$$

Proof: Use the identity
$$\int f(t)\,\log\frac{f(t)}{c(t)}\,dt = \int f(t)\,\log\frac{f(t)}{c(t)/C}\,dt - \log C. \tag{12}$$

Now, since c(t)/C is a density, it follows from Proposition 1 that
the first term in the rhs of (12) is minimized by f(t) = c(t)/C and
its optimal value is zero, so the infimal value of the lhs of (12)
is $-\log C$, as claimed.

□
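
Lemma 1 can be spot-checked on a finite support, where the integrals become sums. This is a sketch under that discreteness assumption; the weights `c` and the helper name are made up for illustration, and the random search only probes the inequality, it does not prove it.

```python
import math
import random


def objective(f, c):
    """Discrete analogue of the lhs of (11): sum_t f(t) * log(f(t) / c(t))."""
    return sum(ft * math.log(ft / ct) for ft, ct in zip(f, c) if ft > 0.0)


c = [0.5, 1.5, 2.0, 4.0]           # positive and summable, C = 8
C = sum(c)
f_star = [ct / C for ct in c]      # the minimizer named in the proof
optimal_value = objective(f_star, c)

# Random densities on the same support never do better than f_star.
rng = random.Random(0)
best_random = math.inf
for _ in range(2000):
    w = [rng.random() + 1e-12 for _ in c]
    s = sum(w)
    f = [wi / s for wi in w]
    best_random = min(best_random, objective(f, c))
```

Here `optimal_value` agrees with $-\log C$ up to rounding, and no sampled density falls below it.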

We now form the Lagrangian of problem (E), $L: D_k \times R^m \to R$, with
values

$$L(f,y) = I(f) - \sum_{i=1}^m y_i \int g_i(x,t)\,f(t)\,dt + a^T y .$$

The dual objective function is

$g(y) = \inf_{f \in D_k} L(f,y)$

or, more explicitly

$$\begin{aligned}
g(y) &= \inf_{f \in D_k} \Big\{ I(f) - \sum_i y_i \int g_i(x,t)\,f(t)\,dt + a^T y \Big\} \\
     &= \inf_{f \in D_k} \Big\{ \int f(t)\,\log\frac{f(t)}{f_b(t)\,e^{\sum_i y_i g_i(x,t)}}\,dt \Big\} + a^T y \\
     &= a^T y - \log \int f_b(t)\,e^{\sum_i y_i g_i(x,t)}\,dt \quad \text{(by Lemma 1).}
\end{aligned}$$

So, the dual of (E) is

(H)  $\sup_{y \ge 0} \Big\{ a^T y - \log \int f_b(t)\,e^{\sum_i y_i g_i(x,t)}\,dt \Big\}$.

Theorem 1: [Duality Theory for (E)-(H).]

(a) If (E) is feasible then inf(E) is attained and
    min(E) = sup(H).

(b) sup(H) < ∞ if and only if (E) is feasible.

(c) sup(H) is attained if there exists a density f in $D_k$ satisfying
    the constraints (10) strictly, in which case
    min(E) = max(H).

Moreover, if $f^* \in D_k$ solves (E) and $y^* \ge 0$ solves (H) then

$$f^*(t) = \frac{f_b(t)\,e^{\sum_i y_i^* g_i(x,t)}}{\int f_b(t)\,e^{\sum_i y_i^* g_i(x,t)}\,dt} \quad \text{(a.e.)}$$


Proof: We set problem (E) as a convex problem, in an appropriate vector
space, with finitely many linear constraints, as follows. Let M(B) be
the linear space of real-valued finite regular Borel measures (rBm) on B.
Let dt be a nonnegative rBm on B. For $\mu \in M(B)$ which is absolutely
continuous w.r.t. dt, we denote by $d\mu/dt$ its Radon-Nikodym derivative.
Whenever $\mu \in S$ (the convex subset of probability measures) we call
$f(t) = d\mu/dt$ a (generalized) density. Let

$$J(\mu) = \begin{cases} \int f(t)\,\log\frac{f(t)}{f_b(t)}\,dt & \text{if } \mu \text{ is an abs. cont. probability measure, and } f = d\mu/dt \\ \infty & \text{otherwise} \end{cases}$$

and consider the linear operator $A: M(B) \to R^m$

$$A\mu = \begin{pmatrix} \int g_1(x,t)\,d\mu \\ \vdots \\ \int g_m(x,t)\,d\mu \end{pmatrix} \tag{13}$$

Then, problem (E) amounts to

$$\inf\{J(\mu) : A\mu \ge a\}. \tag{14}$$

Now, (H) is just the Lagrangian dual of (14) (which here coincides with
the Fenchel-Rockafellar dual [11]) and most of the results in the theorem
follow from standard duality relations (e.g. [10], [11], and [8]).
Thus, the fact that the dual (H) has only nonnegativity constraints $y \ge 0$
(and hence satisfies the strongest constraint qualification) implies
lack of duality gap and attainment of the primal infimum. Part (c) is
just the usual dual statement. As for part (b), the implication

(E) feasible $\implies$ sup(H) < ∞

follows from weak duality. Thus, only the reverse implication

$$\text{(E) infeasible} \implies \sup(H) = \infty \tag{15}$$

is exceptional here and needs special care.

The feasible set of (E) is

$$A\mu \ge a, \quad T\mu = 1, \quad \mu \text{ nonnegative} \tag{16}$$

where A is the linear operator (13), and T is the linear functional

$\mu \mapsto \int d\mu$.

Using a duality theorem for linear programs in vector spaces (e.g. [8],
Theorem 3.13.3, p. 68), it follows that the infeasibility of (16) is
equivalent to the feasibility of

$$A^* y + T^* v \le 0, \quad y^T a + v > 0, \quad y \in R^m_+,\ v \in R. \tag{17}$$

Here $A^*: R^m \to C(B)$, $T^*: R \to C(B)$ are the adjoints of A and T
respectively: $A^* y = \sum_i y_i g_i(x,t)$; $T^* v = v$ (a constant function in
C(B), the linear space of continuous functions on B, the dual space of
M(B)). So, (17) implies that

$\exists\, \bar y \ge 0,\ \bar v \in R$ such that

$$\sum_i \bar y_i g_i(x,t) + \bar v \le 0, \quad \bar y^T a + \bar v > 0. \tag{18}$$

Now, using the identity

$0 = v - \log e^v$

it is easy to see that the dual program (H) is equivalent to

$$\sup_{y \ge 0,\ v \in R} \Big\{ y^T a + v - \log \int f_b(t)\,e^{\sum_i y_i g_i(x,t) + v}\,dt \Big\}. \tag{19}$$

By taking $\bar y$, $\bar v$ from (18), and M > 0 arbitrarily large, it is seen that
the sup in (19) is made arbitrarily large by choosing $y = M\bar y$, $v = M\bar v$,
i.e. sup(H) = ∞.

□

From Theorem 1 we obtain a dual representation of the entropic
penalty function, which is much simpler than the primal expression
given by (3):

$$P_E(x) = \sup_{y \ge 0} \Big\{ y^T a - \log \int f_b(t)\,e^{\sum_{i=1}^m y_i g_i(x,t)}\,dt \Big\}. \tag{20}$$

This representation is a key factor in deriving important facts (some
mentioned already in Chap. 2) about $P_E$ and about the dual problem of
(DP). As an "appetizer" we obtain the explicit expression of $P_E$ for
independent chance constraints.
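
To see how (20) turns the penalty into a finite-dimensional concave maximization, here is a minimal sketch for a single constraint (m = 1). It assumes g(x,b) takes finitely many values with known probabilities; the function names, the ternary search, and the cap `y_max` are illustrative choices, not the paper's method.

```python
import math


def log_mgf(y, values, probs):
    """psi(y) = log E exp(y * G) for a discrete random variable G = g(x, b)."""
    shift = max(y * v for v in values)   # subtract the largest exponent for stability
    return shift + math.log(sum(p * math.exp(y * v - shift)
                                for v, p in zip(values, probs)))


def entropic_penalty(a, values, probs, y_max=50.0, iters=200):
    """sup over 0 <= y <= y_max of y*a - psi(y), by ternary search (objective is concave)."""
    def obj(y):
        return y * a - log_mgf(y, values, probs)

    lo, hi = 0.0, y_max
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if obj(m1) < obj(m2):
            lo = m1
        else:
            hi = m2
    return max(0.0, obj(0.5 * (lo + hi)))


values, probs = [0.0, 1.0, 2.0], [0.5, 0.3, 0.2]   # E g = 0.7, sup g = 2
feasible = entropic_penalty(0.5, values, probs)     # a <= E g: no penalty
violated = entropic_penalty(1.0, values, probs)     # E g < a <= sup g: positive, finite
capped_50 = entropic_penalty(2.5, values, probs, y_max=50.0)    # a > sup g
capped_100 = entropic_penalty(2.5, values, probs, y_max=100.0)  # grows with the cap
```

The three regimes of Proposition 2 show up directly: zero when the constraint holds in the mean, a finite positive value when it is violated in the mean but still attainable, and a value that grows without bound in `y_max` when $a$ exceeds every realization.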

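Example 1: Problem (CC) with independent constraints is

$\inf\{g_0(x) : \Pr(g_i(x) \le b_i) \ge 1-\alpha_i,\ i = 1,\ldots,m\}$

and the corresponding (CCDP) problem is

$\inf_x \Big\{ g_0(x) + p \sum_{i=1}^m P_E^i(x) \Big\}$.

By (20):

$$P_E^i(x) = \sup_{0 \le y \in R} \Big\{ y(1-\alpha_i) - \log \int f_{b_i}(t)\,e^{y\,g_i(x,t)}\,dt \Big\}. \tag{21}$$

Recalling from (6) that

$$g_i(x,t) = \begin{cases} 1 & \text{if } g_i(x) \le t \\ 0 & \text{otherwise} \end{cases}$$

we get from (21), in terms of the cumulative distribution function
$F_i$ of $b_i$,

$$P_E^i(x) = \sup_{y \ge 0} \big\{ y(1-\alpha_i) - \log\big[(1 - F_i(g_i(x)))\,e^y + F_i(g_i(x))\big] \big\}. \tag{22}$$

By simple calculus, the maximizing y is $y_i$ given by

$$y_i = \begin{cases} \log\dfrac{(1-\alpha_i)\,F_i(g_i(x))}{\alpha_i\,(1 - F_i(g_i(x)))} & \text{if } F_i(g_i(x)) > \alpha_i \\ 0 & \text{if } F_i(g_i(x)) \le \alpha_i \end{cases}$$

Substituting $y_i$ in (22) yields

$$P_E^i(x) = \begin{cases} 0 & \text{if } F_i(g_i(x)) \le \alpha_i, \text{ i.e. } \Pr(g_i(x) \le b_i) \ge 1-\alpha_i \\ \alpha_i \log\dfrac{\alpha_i}{F_i(g_i(x))} + (1-\alpha_i) \log\dfrac{1-\alpha_i}{1 - F_i(g_i(x))} & \text{if } F_i(g_i(x)) > \alpha_i \end{cases} \tag{23}$$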
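
The closed form for a single chance constraint can be cross-checked against the one-dimensional supremum it came from. In the sketch below `F` stands for the value $F_i(g_i(x))$ and `alpha` for $\alpha_i$; the function names and the grid check are illustrative assumptions.

```python
import math


def objective_22(y, F, alpha):
    """Objective of (22): y*(1 - alpha) - log((1 - F)*exp(y) + F)."""
    return y * (1.0 - alpha) - math.log((1.0 - F) * math.exp(y) + F)


def cc_penalty_closed_form(F, alpha):
    """Closed form (23), for 0 < alpha < 1 and 0 <= F < 1."""
    if F <= alpha:
        return 0.0
    return (alpha * math.log(alpha / F)
            + (1.0 - alpha) * math.log((1.0 - alpha) / (1.0 - F)))


def cc_penalty_dual(F, alpha):
    """Objective of (22) evaluated at the stationary point y_i."""
    if F <= alpha:
        y = 0.0
    else:
        y = math.log((1.0 - alpha) * F / (alpha * (1.0 - F)))
    return objective_22(y, F, alpha)


F, alpha = 0.3, 0.1      # Pr(g_i(x) <= b_i) = 0.7 < 0.9: constraint violated
closed = cc_penalty_closed_form(F, alpha)
dual = cc_penalty_dual(F, alpha)
grid_max = max(objective_22(0.001 * k, F, alpha) for k in range(10001))
```

In this reading the penalty is the relative entropy between two-point distributions with parameters $\alpha_i$ and $F_i(g_i(x))$, which is why it vanishes exactly when the chance constraint holds.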

Remark 1: The function $h_i(t) = \alpha_i \log\frac{\alpha_i}{t} + (1-\alpha_i) \log\frac{1-\alpha_i}{1-t}$ is
convex and increasing for $0 < \alpha_i \le t < 1$ and $h_i(\alpha_i) = 0$. If $F_i(g_i(x))$
is convex (i.e. if $\Pr(g_i(x) \le b_i)$ is concave in x) then $h_i(F_i(g_i(x)))$
is convex for x such that $\alpha_i \le F_i(g_i(x))$. This proves that $P_E^i(x)$ is
convex since by the above and (23):

$P_E^i(x) = h_i(\max(\alpha_i, F_i(g_i(x))))$.

The objective function in (20), in terms of which $P_E$ is computed,
is $y^T a - \psi(y)$ where

$$\psi(y) = \log E_b\, e^{y^T g(x,b)}. \tag{24}$$

If the random vector g(x,b) is nondegenerate (i.e. $\forall y \ne 0$, $y^T g(x,b)$
is not a degenerate univariate random variable), then $\psi(y)$ is strictly
convex, as follows from the following:

Lemma 2: If z is a nondegenerate random vector in $R^m$, then the
function

$\psi(y) = \log E_z\, e^{y^T z}$

is strictly convex in y.

Proof: Consider the function $h(t_1,t_2) = t_1^{\lambda} t_2^{1-\lambda}$ $(0 < \lambda < 1)$. It is
strictly concave for $t_1 > 0$, $t_2 > 0$, $t_1 \ne t_2$, so by Jensen's inequality

$E(t_1^{\lambda} t_2^{1-\lambda}) < (E t_1)^{\lambda} (E t_2)^{1-\lambda}$.

Put $t_1 = e^{y_1^T z}$, $t_2 = e^{y_2^T z}$; then

$E_z\, e^{\lambda y_1^T z}\, e^{(1-\lambda) y_2^T z} < (E_z\, e^{y_1^T z})^{\lambda} (E_z\, e^{y_2^T z})^{1-\lambda}$

or, taking logs,

$\log E_z\, e^{(\lambda y_1 + (1-\lambda) y_2)^T z} < \lambda \log E_z\, e^{y_1^T z} + (1-\lambda) \log E_z\, e^{y_2^T z}$,

which proves the strict convexity of $\psi(y)$.

□

We will derive still another expression of $P_E$ in terms of the conjugate
function $\psi^*$ of $\psi$, i.e.

$\psi^*(u) = \sup_y \{u^T y - \psi(y)\}$.
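
The strict convexity asserted in Lemma 2, and the role of nondegeneracy, can be illustrated numerically for a scalar z with finite support. The distribution, the test points and the helper name below are arbitrary illustrative choices.

```python
import math


def log_mgf(y, values, probs):
    """psi(y) = log E exp(y * z) for a discrete scalar random variable z."""
    return math.log(sum(p * math.exp(y * v) for v, p in zip(values, probs)))


values, probs = [-1.0, 0.0, 2.0], [0.3, 0.4, 0.3]   # nondegenerate z
y1, y2, lam = -0.7, 1.2, 0.25
y_mix = lam * y1 + (1.0 - lam) * y2

lhs = log_mgf(y_mix, values, probs)
rhs = lam * log_mgf(y1, values, probs) + (1.0 - lam) * log_mgf(y2, values, probs)

# For a degenerate z (a constant) psi is linear, so equality holds instead.
const_lhs = log_mgf(y_mix, [2.0], [1.0])
const_rhs = lam * log_mgf(y1, [2.0], [1.0]) + (1.0 - lam) * log_mgf(y2, [2.0], [1.0])
```

For the three-point distribution the inequality is strict by a wide margin, while for the constant random variable both sides coincide up to rounding.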

Proposition 7:

$$P_E(x) = \inf_{u \ge a} \psi^*(u) \tag{25}$$

where $\psi^*$ is the conjugate of the strictly convex function $\psi$, given
in (24). Moreover, if the expectation $E_b\, e^{y^T g(x,b)}$ is finite for
every y then

$$\psi^*(u) = u^T \nabla\psi^{-1}(u) - \psi(\nabla\psi^{-1}(u)) \tag{26}$$

where $\nabla\psi$ is the gradient vector of $\psi$, i.e. the i-th component of
$\nabla\psi$ is

$$[\nabla\psi(y)]_i = \frac{E_b\, g_i(x,b)\,e^{y^T g(x,b)}}{E_b\, e^{y^T g(x,b)}} .$$

Proof: By (20)

$$P_E(x) = \sup_{y \ge 0} \{y^T a - \psi(y)\}. \tag{27}$$

The Lagrangian dual of the problem in the rhs of (27) is easily seen
to be

$\inf_{v \ge 0} \psi^*(a+v)$

and with the change of variables u = a+v one obtains (25). The strict
convexity of $\psi$ follows from Lemma 2, and the finiteness assumption
implies that $\psi$ is also smooth. Hence $\nabla\psi$ is a strictly monotone
mapping and $\psi^*$ coincides with its Legendre Transform, which is
the rhs of (26) (see [12], Chap. 26).

□

Example 2: Consider the Stochastic RHS problem (SP-RHS) with b a
jointly Normal random vector, with mean vector $\mu$ and covariance matrix
V (positive definite since b is assumed nondegenerate). Then a direct
computation shows that here (20) becomes the quadratic program:

$$P_E(x) = \sup_{y \ge 0} \{ y^T (g(x) - \mu) - \tfrac{1}{2}\, y^T V y \},$$

while (25) is the dual quadratic program:

$$P_E(x) = \inf_{u \ge 0} \{ \tfrac{1}{2}\, (g(x) - \mu + u)^T V^{-1} (g(x) - \mu + u) \}.$$
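
For a single Normal right-hand side the two quadratic programs of Example 2 can be compared directly. The sketch below is a scalar illustration only: it writes $c = g(x) - \mu$, takes the variance `v` in place of V, parametrizes the dual with the slack entering as $c + u$, $u \ge 0$, and uses a crude grid search; all names are hypothetical.

```python
def normal_penalty_primal(c, v, steps=20000, y_max=20.0):
    """sup over y in [0, y_max] of y*c - 0.5*v*y^2, evaluated on a grid."""
    best = 0.0
    for k in range(steps + 1):
        y = y_max * k / steps
        best = max(best, y * c - 0.5 * v * y * y)
    return best


def normal_penalty_dual(c, v, steps=20000, u_max=20.0):
    """inf over u in [0, u_max] of 0.5*(c + u)^2 / v, evaluated on a grid."""
    best = float("inf")
    for k in range(steps + 1):
        u = u_max * k / steps
        best = min(best, 0.5 * (c + u) ** 2 / v)
    return best


def normal_penalty_closed_form(c, v):
    """Scalar closed form: max(0, c)^2 / (2*v)."""
    return max(0.0, c) ** 2 / (2.0 * v)


violated = (normal_penalty_primal(1.5, 2.0),
            normal_penalty_dual(1.5, 2.0),
            normal_penalty_closed_form(1.5, 2.0))
satisfied = (normal_penalty_primal(-1.0, 2.0),
             normal_penalty_dual(-1.0, 2.0),
             normal_penalty_closed_form(-1.0, 2.0))
```

All three evaluations agree in both cases: a positive value when $g(x) > \mu$ and zero when the constraint holds in the mean, which matches the mean-variance penalty quoted in the Introduction.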


For a Stochastic Program with independent constraints a further
simplification of the expression for $P_E$ is possible. In fact, the
infimum in (25) can be computed, and we get an explicit representation
of $P_E$ in terms of the conjugate function of

$\psi_i(y_i) = \log E_{b_i}\, e^{y_i g_i(x,b_i)}$.

We use the following notation: for a function h(t), $h: R \to R$, let
$Dh = dh/dt$ denote its derivative and $D^{-1}h$ the inverse function of Dh.
Let also

$m_i(x) = E_{b_i} g_i(x,b_i)$.

Proposition 8: For (SP), with independent constraints

$$P_E(x) = \sum_{i=1}^m \psi_i^*(\max(m_i(x), a_i)) \tag{28}$$

where

$$\psi_i^*(t) = t\,D^{-1}\psi_i(t) - \psi_i(D^{-1}\psi_i(t)). \tag{29}$$

Moreover, $\psi_i^*(t)$ is a strictly increasing function for $t > m_i(x)$.

Proof: By Proposition 3, $P_E(x) = \sum_i P_E^i(x)$, therefore we have to show that

$P_E^i(x) = \psi_i^*(\max(m_i(x), a_i))$.

Now, from Proposition 7:

$P_E^i(x) = \inf_{u \ge a_i} \psi_i^*(u)$

where $\psi_i^*$ is exactly given by (29). The function $\psi_i^*$ is strictly
convex and simple calculus shows that

$$P_E^i(x) = \inf_{u \ge a_i} \psi_i^*(u) = \psi_i^*(\max(D^{-1}\psi_i^*(0),\ a_i)). \tag{30}$$

But it is a well known fact of conjugate functions that $D^{-1}\psi_i^* = D\psi_i$, so

$$D^{-1}\psi_i^*(0) = D\psi_i(0) = E_{b_i} g_i(x,b_i) = m_i(x). \tag{31}$$

Using this in (30), the desired expression for $P_E^i$ is obtained. To
prove the last statement of the proposition, note that from (31)

$0 = D\psi_i^*(m_i(x))$

and since $\psi_i^*$ is strictly convex, this implies

$D\psi_i^*(t) > 0$, for $t > m_i(x)$,

which establishes the claimed monotonicity.

□

Remark 2: The last statement of Proposition 8 and (28) provide a
proof for Proposition 4.

Consider the saddle function

$$k(x,y) = g_0(x) + p\,\big(y^T a - \log E_b\, e^{y^T g(x,b)}\big). \tag{32}$$

Then, by the dual expression (20) of $P_E$, we see that the Deterministic
Primal problem (DP) becomes

(DP)  $\inf_x \sup_{y \ge 0} k(x,y)$.

An equivalent program will be generated if we use another saddle function

$\ell(x,y) = -e^{-\frac{1}{p}\,k(x,\,y/p)}$

obtained from k by one-to-one transformations of its domain and range.
Now, a little algebra shows that

$\ell(x,y) = -E_b\, e^{-\frac{1}{p}\,[\,g_0(x) - y^T (g(x,b) - a)\,]}$;

thus, we proved:

Theorem 2: The Deterministic Primal problem (DP), derived via the
entropic penalty approach, is equivalent to the saddle-function problem

(DP-EU)  $\inf_x \sup_{y \ge 0} E\,U(\ell_b(x,y))$

where U(·) is the constant-risk-aversion utility function $U(t) = -e^{-(1/p)t}$
(or any positive affine transformation of it) and where $\ell_b(x,y)$ is
the classical Lagrangian corresponding to the original (SP) problem,
i.e.

$\ell_b(x,y) = g_0(x) - y^T (g(x,b) - a)$.

□
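
The "little algebra" behind Theorem 2 can be verified numerically for one scalar constraint. The sketch assumes g(x,b) has a finite list of realizations with given probabilities at a fixed x; the numbers and function names are illustrative only.

```python
import math


def k_saddle(g0, a, y, g_values, probs, p):
    """k(x, y) of (32) for one scalar constraint; g_values are realizations of g(x, b)."""
    log_mgf = math.log(sum(q * math.exp(y * g) for g, q in zip(g_values, probs)))
    return g0 + p * (y * a - log_mgf)


def expected_utility(g0, a, y, g_values, probs, p):
    """E U(l_b(x, y)) with U(t) = -exp(-t/p) and l_b(x, y) = g0 - y*(g(x, b) - a)."""
    return sum(q * -math.exp(-(g0 - y * (g - a)) / p)
               for g, q in zip(g_values, probs))


g0, a, p, y = 1.3, 0.8, 2.0, 0.6
g_values, probs = [0.0, 1.0, 3.0], [0.5, 0.3, 0.2]

# Transform k in its domain (y -> y/p) and range (t -> -exp(-t/p)) ...
transformed = -math.exp(-k_saddle(g0, a, y / p, g_values, probs, p) / p)
# ... and compare with the expected utility of the classical Lagrangian.
direct = expected_utility(g0, a, y, g_values, probs, p)
```

Both numbers coincide up to rounding. Since $t \mapsto -e^{-t/p}$ is increasing and $y \mapsto y/p$ maps $y \ge 0$ onto itself, the inf-sup structure of (DP) is preserved by the transformation.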

CHAPTER 4 - THE DUAL PROBLEM OF (DP) FOR STOCHASTIC RHS PROGRAMS

In this section we treat exclusively the problem

(SP-RHS)  $\inf\{g_0(x) : g_i(x) \le b_i,\ i = 1,\ldots,m\}$.

This is a specialization of the general (SP) problem with
$g(x,b) = b - g(x)$ and a = 0. The expression for the entropic penalty
$P_E$ is given in (9). From the results of Chap. 3, dual representations of
$P_E$ are, by (20) and Proposition 7:

Here $b_{\max}$ is the vector whose i-th component is the right extreme
value of the support of $b_i$. Therefore, the Deterministic Primal problem
is here a relaxation of the problem

$\inf_x \{g_0(x) : g(x) \le \mu\}$

and it rules out surely infeasible solutions, i.e. those x's for
which $g(x) \not\le b_{\max}$.

We have already computed $P_E$ for the case of joint Normal random
variables (Example 2). We add here two more examples for (SP-RHS)
with independent $b_i$'s.

Example 3: (Independent Poisson variates). Let the $b_i$'s be
independent random variables, each having a Poisson distribution with
parameter (mean) $\lambda_i$, so

$f_{b_i}(k) = \frac{\lambda_i^k}{k!}\,e^{-\lambda_i}, \quad k = 0, 1, 2, \ldots$

The function $\psi_i(\cdot)$ in (36) is the log of the moment generating
function, so

$\psi_i(y) = \lambda_i (e^y - 1)$.

The derivative is $D\psi_i(y) = \lambda_i e^y$, the inverse is $D^{-1}\psi_i(t) = \log(t/\lambda_i)$
and thus by (36):

$\psi_i^*(t) = t \log(t/\lambda_i) - t + \lambda_i$.

Note that $\psi_i^*$ is a convex and strictly increasing function for $t > \lambda_i$,
as anticipated by Proposition 8.

The final expression for $P_E$ is by (36):

$$P_E(x) = \sum_{i=1}^m \lambda_i \Big\{ \Big(1 + \frac{(g_i(x)-\lambda_i)_+}{\lambda_i}\Big) \log\Big(1 + \frac{(g_i(x)-\lambda_i)_+}{\lambda_i}\Big) - \frac{(g_i(x)-\lambda_i)_+}{\lambda_i} \Big\} \;{}^{\dagger}$$
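
The Poisson conjugate can be checked against a direct numerical supremum, and then used to evaluate the penalty. The sketch assumes that the penalty for independent right-hand sides is $\sum_i \psi_i^*(\max(g_i(x), \lambda_i))$, which is what results from substituting (8) into (28); function names, search bounds and sample numbers are illustrative.

```python
import math


def poisson_conjugate(t, lam):
    """psi*(t) = t*log(t/lam) - t + lam, for t > 0."""
    return t * math.log(t / lam) - t + lam


def poisson_conjugate_numeric(t, lam, y_lo=-20.0, y_hi=20.0, iters=200):
    """sup over y of t*y - lam*(exp(y) - 1), by ternary search (concave in y)."""
    def obj(y):
        return t * y - lam * (math.exp(y) - 1.0)

    lo, hi = y_lo, y_hi
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if obj(m1) < obj(m2):
            lo = m1
        else:
            hi = m2
    return obj(0.5 * (lo + hi))


def poisson_penalty(g_values, lams):
    """Penalty for independent Poisson right-hand sides: sum_i psi_i*(max(g_i, lam_i))."""
    return sum(poisson_conjugate(max(g, lam), lam) for g, lam in zip(g_values, lams))


closed = poisson_conjugate(5.0, 3.0)
numeric = poisson_conjugate_numeric(5.0, 3.0)
# First constraint holds in the mean (2 <= 3), second is violated (5 > 3).
penalty = poisson_penalty([2.0, 5.0], [3.0, 3.0])
```

Only the violated constraint contributes, and its contribution equals $\lambda\{(1 + d/\lambda)\log(1 + d/\lambda) - d/\lambda\}$ with $d = (g_i(x) - \lambda_i)_+ = 2$.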


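Example 4:  (Independent Gamma variates).  Let each $b_i$ have a Gamma
distribution with parameters $\lambda_i$ and $r_i$, i.e. the density is

$$f_{b_i}(t) = \frac{\lambda_i^{r_i}}{\Gamma(r_i)}\, t^{r_i - 1} e^{-\lambda_i t}, \qquad t > 0.$$

The mean is $\mu_i = E(b_i) = r_i/\lambda_i$, and the moment generating function is
$(1 - y/\lambda_i)^{-r_i}$, $(y < \lambda_i)$.  Therefore here

$$\phi_i(y) = -r_i \log(1 - y/\lambda_i) = -r_i \log(1 - y\mu_i/r_i),$$

$$D\phi_i(y) = \frac{\mu_i r_i}{r_i - \mu_i y}, \qquad y < r_i/\mu_i;$$

$$(D\phi_i)^{-1}(t) = (r_i/\mu_i)(1 - \mu_i/t), \qquad t > \mu_i;$$

$$\phi_i^*(t) = r_i\,[\,t/\mu_i - 1 - \log(t/\mu_i)\,], \qquad t > \mu_i.$$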


We obtain finally from (36):

$$P_E(x) = \sum_{i=1}^m r_i \left\{ \frac{(g_i(x)-\mu_i)^+}{\mu_i} - \log\left(1 + \frac{(g_i(x)-\mu_i)^+}{\mu_i}\right) \right\}.$$

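Note that for the Gamma distribution, the variance ($\sigma_i^2$) of the $b_i$'s is
$\sigma_i^2 = r_i/\lambda_i^2 = \mu_i^2/r_i$, so $r_i = \mu_i^2/\sigma_i^2$ and $P_E$ is given in terms of the
mean and variance by

$$P_E(x) = \sum_{i=1}^m \frac{\mu_i^2}{\sigma_i^2} \left\{ \frac{(g_i(x)-\mu_i)^+}{\mu_i} - \log\left(1 + \frac{(g_i(x)-\mu_i)^+}{\mu_i}\right) \right\}.$$  (37)

† For a real number $a$ we denote $a^+ = \max(a, 0)$.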
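As an illustration of (37), the Gamma entropic penalty can be evaluated directly from the means and variances of the $b_i$'s. This is a minimal sketch with a hypothetical function name; the inputs are the values $g_i(x)$ at some fixed $x$, which the caller is assumed to have computed.

```python
import math

def gamma_penalty(g_values, means, variances):
    """Entropic penalty (37) for independent Gamma right-hand sides.

    g_values[i] is g_i(x); means[i] and variances[i] describe b_i.
    """
    total = 0.0
    for g, mu, var in zip(g_values, means, variances):
        s = max(g - mu, 0.0) / mu          # relative violation (g_i - mu_i)^+ / mu_i
        total += (mu * mu / var) * (s - math.log1p(s))
    return total
```

With $\mu_i = 2$, $\sigma_i^2 = 1$ and $g_i(x) = 3$ the single term equals $4\,(0.5 - \log 1.5)$, and the penalty is zero whenever every $g_i(x) \le \mu_i$.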
In terms of the saddle function (32), which here becomes

$$k(x,y) = g_0(x) + p\,\bigl(y^T g(x) - \log E_b e^{y^T b}\bigr),$$  (38)

the Primal Deterministic Problem $\inf_x \{g_0(x) + p P_E(x)\}$ is

(DP-RHS)  $\inf_x \sup_{y \ge 0} k(x,y)$.

We define the Dual Deterministic Problem (DD-RHS) corresponding to
(DP-RHS) by

(DD-RHS)  $\sup_{y \ge 0} \inf_x k(x,y)$.

Thus, the dual objective function is

$h(y) = \inf_x k(x,y)$  ($k(x,y)$ given in (38))

and the dual problem is

(DD-RHS)  $\sup_{y \ge 0} h(y)$.†

The key issue, of course, regarding the dual pair (DP-RHS) and
(DD-RHS), is the absence of a duality gap, which here corresponds to the
existence of a saddle value for $k$, i.e. the validity of

$$\inf_x \sup_{y \ge 0} k(x,y) = \sup_{y \ge 0} \inf_x k(x,y).$$  (39)

In  this  connection  we  make  use  of  two  conditions  which  guarantee  (39) 
for  a  general  convex-concave  saddle  function  k(x,y). 

Condition 1:  (Stoer [13], Corollary 2.13)  "The $\inf_x \sup_{y \ge 0} k(x,y)$ is attained
and $k(x,\cdot)$ is strictly concave".

† The problem may include implicitly more constraints on $y$ coming
from the requirement $E_b e^{y^T b} < \infty$.

Condition 2:  (Rockafellar [9], Theorem 8(1), see in particular the
Example on page 173)  "No nonzero $y^0 \ge 0$ has the property

$$(y^0)^T \nabla_y k(x,y) \ge 0 \qquad \forall\,(x \in R^n,\ y \ge 0).$$"

We  now  establish  a  minimax  theorem  for  k(x,y)  in  (38). 

Theorem 3:  Let (SP-RHS) be a convex program (i.e. $g_0$ and $g_i$,
$i = 1,\dots,m$ are convex functions), and consider the saddle function
in (38):

$$k(x,y) = g_0(x) + p\,\bigl(y^T g(x) - \log E_b e^{y^T b}\bigr).$$

Then, either one of the following two conditions

(i)  $\inf_x \sup_{y \ge 0} k(x,y)$ is attained

(ii)  $\exists x \in R^n$ such that $g(x) < b_{\max}$,

implies the existence of a saddle value for $k$, i.e.
the validity of (39).

Proof:  The convexity of $g_0$ and all $g_i$ $(i = 1,\dots,m)$ implies that
$k(\cdot,y)$ is convex for every $y \ge 0$. From Lemma 2 we know that

$$\psi(y) = \log E_b e^{y^T b}$$

is strictly convex, hence $k(x,\cdot)$, in (38), is strictly concave. Therefore,
condition (i) in the Theorem suffices to imply Condition 1 of
Stoer. Condition 2 of Rockafellar reduces here to the nonexistence of
a nonzero $y^0 \ge 0$ such that

$$(y^0)^T \left[ g(x) - \frac{E(b\,e^{y^T b})}{E e^{y^T b}} \right] \ge 0 \qquad \forall\,(x \in R^n,\ y \ge 0).$$  (40)
This is clearly satisfied if

$$\exists\, x \text{ and } y \ge 0 \text{ such that } g(x) < \frac{E(b\,e^{y^T b})}{E e^{y^T b}} = \nabla\psi(y).$$  (41)


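To show that condition (ii) implies (41) it suffices to demonstrate that

$$b_{\max} = \sup_{0 \le y \in R^m} \nabla\psi(y).$$  (42)

Let $\psi_i(y_i) = \psi(0, 0, \dots, y_i, \dots, 0)$, $i = 1, 2, \dots, m$, i.e.

$$\psi_i(y_i) = \log E e^{y_i b_i}.$$

Note that

$$\sup_{0 \le y_i \in R} \psi_i'(y_i) \le \sup_{0 \le y \in R^m} \frac{\partial}{\partial y_i}\psi(y), \qquad \forall i,$$

hence to prove (42) it suffices to prove that

$$b_i^{\max} \le \sup_{0 \le y_i \in R} \psi_i'(y_i) = \sup_{0 \le y_i \in R} \left( \frac{E(b_i e^{y_i b_i})}{E e^{y_i b_i}} \right),$$  (43)

where $b_i^{\max}$ denotes the i-th component of $b_{\max}$. For this purpose
consider a special case of problem (E) in Chap. 3
with a single random variable $b_i$, and with $a_i = b_i^{\max}$, $g_i(x,t) = t$
and a single constraint (the i-th), i.e.

$(E_i)$  $\inf_{f \in D_i} \left\{ I(f, f_{b_i}):\ \int t f(t)\,dt \ge b_i^{\max} \right\}$.

The dual program is (see Chap. 3)

$(H_i)$  $\sup_{0 \le y \in R} \{ b_i^{\max} y - \log E e^{y b_i} \} = \sup_{0 \le y \in R} \{ b_i^{\max} y - \psi_i(y) \}$.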
Program $(E_i)$ is clearly feasible (take $f(t) = 1$ for $t = b_i^{\max}$, the i-th
component of $b_{\max}$, and $f(t) = 0$ otherwise) and hence, by Theorem 1,
$\sup(H_i) < \infty$. Now $\psi_i(y)$ is convex and $\psi_i(0) = 0$, hence by the
gradient inequality

$$0 = \psi_i(0) \ge \psi_i(y) - y\,\psi_i'(y)$$

and we get

$$\infty > \sup(H_i) = \sup_{y \ge 0} \{ b_i^{\max} y - \psi_i(y) \} \ge \sup_{y \ge 0} \{ b_i^{\max} y - y\,\psi_i'(y) \} = \sup_{y \ge 0} \{ y\,(b_i^{\max} - \psi_i'(y)) \}.$$

For the latter to be finite for $y \to \infty$ it is necessary that
$b_i^{\max} \le \lim_{y \to \infty} \psi_i'(y)$, but since $\psi_i'$ is strictly increasing (a
derivative of the strictly convex function $\psi_i$) this is the same as (43),
and the proof is completed.  □
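The limit used at the end of the proof can be illustrated numerically. The sketch below is an illustration only, under the assumption that $b_i$ is a finite discrete random variable (the support and probabilities in the usage note are made up): it evaluates the exponentially tilted mean $\psi_i'(y) = E(b_i e^{y b_i}) / E e^{y b_i}$, which should increase with $y$ toward the right endpoint of the support.

```python
import math

def tilted_mean(y, support, probs):
    """psi'(y) = E[b e^{y b}] / E[e^{y b}] for a finite discrete b, y >= 0."""
    # Subtract the largest exponent so that exp() never overflows for y >= 0.
    shift = y * max(support)
    weights = [p * math.exp(y * b - shift) for b, p in zip(support, probs)]
    return sum(w * b for w, b in zip(weights, support)) / sum(weights)
```

For a variable taking the values 0, 1, 4 with probabilities 0.5, 0.3, 0.2, `tilted_mean(0.0, ...)` is the ordinary mean 1.1, and the value approaches 4 as `y` grows.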


Remark 3:  Condition (ii), which guarantees the absence of a duality gap
for (DP-RHS) and (DD-RHS), is extremely mild. Indeed, if it does not
hold, then for almost all realizations of $b$, the original (SP) problem
is infeasible. If such ill-posed stochastic programs are ruled out, then
the entropic penalty Deterministic Primal always induces an equivalent
dual program. We shall see shortly what the meaning of this dual
program is.

Remark 4:  Condition (ii) implies in fact a stronger saddle-value
result than (39), namely

$$\inf_x \sup_{y \ge 0} k(x,y) = \max_{y \ge 0} \inf_x k(x,y),$$

i.e. the supremum of the dual objective function $h(y)$ is attained
(see [9]). Condition (i), which assumes that for some $\bar x$, $\bar y \ge 0$

$$\inf_x \sup_{y \ge 0} k(x,y) = k(\bar x, \bar y),$$

implies in fact attainment of the dual saddle value at the same point,
i.e.

$$\sup_{y \ge 0} \inf_x k(x,y) = \max_{y \ge 0} \min_x k(x,y) = k(\bar x, \bar y).$$  (See [13].)


Remark 5:  Stochastic Programs satisfying condition (i) or (ii) of
Theorem 3 will be called well-posed.

Let $\ell_b(x,y)$ be the classical Lagrangian corresponding to (SP-RHS)

$$\ell_b(x,y) = g_0(x) + y^T(g(x) - b)$$

and consider the constant-risk-aversion (CRA) utility function

$$U(t) = -e^{-t/p}$$  (or any positive affine transformation of it).

It follows from Theorem 2 that the primal problem (DP-RHS) is
equivalent to

(DP-EU)  $\inf_x \sup_{y \ge 0} EU(\ell_b(x,y))$.

Therefore, the dual problem (DD-RHS) is equivalent to

(DD-EU)  $\sup_{y \ge 0} \inf_x EU(\ell_b(x,y))$.
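One way to see this equivalence numerically is through the certainty equivalent $U^{-1}(EU(\cdot))$. The sketch below is an illustration under stated assumptions, not the argument of Theorem 2: it takes a single constraint, a finite discrete $b$, and hypothetical function names, and it relies on the observation that the certainty equivalent of $\ell_b(x, p\,y)$ under $U(t) = -e^{-t/p}$ coincides with $k(x,y)$ in (38), i.e. the two problems agree after rescaling the multiplier by $p$.

```python
import math

def saddle_k(g0, g, y, p, support, probs):
    """k(x,y) = g0 + p*(y*g - log E e^{y b}) for one constraint and discrete b."""
    log_mgf = math.log(sum(q * math.exp(y * b) for b, q in zip(support, probs)))
    return g0 + p * (y * g - log_mgf)

def expected_cra_utility(g0, g, y, p, support, probs):
    """E U(l_b(x,y)) with U(t) = -exp(-t/p) and l_b(x,y) = g0 + y*(g - b)."""
    return sum(q * -math.exp(-(g0 + y * (g - b)) / p)
               for b, q in zip(support, probs))

def certainty_equivalent(eu, p):
    """U^{-1}(eu) = -p*log(-eu), defined for eu < 0."""
    return -p * math.log(-eu)
```

Because $U$ is strictly increasing, ranking points $(x,y)$ by expected utility or by certainty equivalent gives the same inf-sup structure; the check `certainty_equivalent(expected_cra_utility(g0, g, p*y, p, ...), p) == saddle_k(g0, g, y, p, ...)` (up to rounding) makes the rescaling explicit.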

To get the full meaning of this dual problem we first prove

Lemma 3:

$$\inf_x EU(\ell_b(x,y)) = EU(\inf_x \ell_b(x,y)).$$  (44)

Proof:

$EU(\inf_x \ell_b(x,y)) = E \inf_x U(\ell_b(x,y))$  since $U$ is monotone increasing

$= E \inf_x \left\{ -e^{-\frac{1}{p}(g_0(x) + y^T g(x) - y^T b)} \right\}$

$= \inf_x \left\{ -\bigl(E e^{\frac{1}{p} y^T b}\bigr)\, e^{-\frac{1}{p}(g_0(x) + y^T g(x))} \right\} = \inf_x EU(\ell_b(x,y))$.  □


Recall that for a non-stochastic problem, the classical Lagrangian
dual is the concave program

$$\sup_{y \ge 0} h(y); \qquad h(y) = \inf_x \ell_b(x,y).$$

From the lemma we observe that in the stochastic case, the dual problem
(DD-EU) consists of maximizing the expected utility of the Lagrangian dual
function, with the utility function being of the CRA-type. More precisely,
combining the results in Theorems 2, 3 and Lemma 3 we have actually
proven:

Theorem 4:  Consider a well-posed convex stochastic program (SP-RHS).
Let (DP-RHS) be the corresponding entropic penalty Deterministic Primal
and let (DD-RHS) be the corresponding Deterministic Dual. Then, (DD-RHS)
is equivalent to the concave program

(DD-EU)  $\max_{y \ge 0} EU(h(y))$;  $h(y) = \inf_x \ell_b(x,y)$

where $U$ is the CRA-utility function with the Arrow-Pratt risk indicator
being equal to the reciprocal of the penalty parameter $p$.

□
CHAPTER  5  -  MEAN-VARIANCE  APPROXIMATIONS

We obtain in this section quadratic approximations of $P_E(x)$, for
the general (SP) problem

(SP)  $\inf\{g_0(x):\ g(x,b) \ge a\}$.

For every fixed $x$, the random vector $g(x,b)$ is assumed
non-degenerate, with mean vector

$$m(x) = E g(x,b)$$

and (positive definite) variance-covariance matrix

$$V(x) = \mathrm{COV}(g(x,b)).$$

The variance vector (diagonal of $V(x)$) is denoted by $\sigma^2(x)$.

Recall from Chap. 3 that

$$P_E(x) = \sup_{y \ge 0} \{ y^T a - \psi(y) \}$$

where

$$\psi(y) = \log E e^{y^T g(x,b)}.$$  (45)

Now, straightforward calculations show that

$$\psi(0) = 0$$  (46)

$$\nabla\psi(0) = m(x)$$  (47)

$$\nabla^2\psi(0) = V(x).$$  (48)

Hence, a second-order Taylor expansion of $\psi(y)$ yields the following
approximation $\hat P_E(x)$ of $P_E(x)$, in terms of a concave quadratic program.
Proposition 9:

$$\hat P_E(x) = \sup_{y \ge 0} \{ y^T[a - m(x)] - \tfrac12\, y^T V(x) y \}.$$

Another expression for the approximation $\hat P_E(x)$ is given in terms of
the following convex quadratic program.

Proposition 10:

$$\hat P_E(x) = \inf_{u \ge a} \{ \tfrac12\, (u - m(x))^T V(x)^{-1} (u - m(x)) \}.$$

Proof:  By Proposition 7: $P_E(x) = \inf_{u \ge a} \psi^*(u)$ where $\psi^*$ is the
conjugate function of $\psi$ in (45). Thus it remains to show that a second
order approximation $\hat\psi^*$ of $\psi^*$ is

$$\hat\psi^*(u) = \tfrac12\, (u - m(x))^T V(x)^{-1} (u - m(x)).$$  (49)

Since the gradient of $\psi$ and its conjugate are inverse operators,
i.e. $\nabla\psi^* = (\nabla\psi)^{-1}$ (see [12], Chap. 26), it follows from (47) that

$$\nabla\psi^*(m(x)) = 0$$  (50)

and so, by (26) and (46), also

$$\psi^*(m(x)) = 0.$$  (51)

Now

$$\nabla^2\psi^* = \nabla(\nabla\psi^*) = \nabla[(\nabla\psi)^{-1}] = [\nabla^2\psi((\nabla\psi)^{-1})]^{-1},$$

by the Inverse Function Theorem; in particular then, by (47), (48):

$$\nabla^2\psi^*(m(x)) = V(x)^{-1}.$$  (52)

A second order Taylor expansion of $\psi^*$:

$$\hat\psi^*(u) = \psi^*(m(x)) + (u - m(x))^T \nabla\psi^*(m(x)) + \tfrac12\, (u - m(x))^T \nabla^2\psi^*(m(x)) (u - m(x))$$  (53)

indeed agrees with (49) by substituting (50)-(52) in (53).  □


Remark 6:  If the random vector $g(x,b)$ is jointly Normal, then

$$E e^{y^T g(x,b)} = \exp\left( y^T m(x) + \tfrac12\, y^T V(x) y \right)$$

so $\psi(y)$ is quadratic, and hence coincides with its Taylor series
approximation $\hat\psi(y)$. The same is true of course for $\psi^*$. Therefore,
the approximations $\hat P_E(x)$ in Propositions 9 and 10 are exact.

If the constraints $g_i(x,b_i) \ge a_i$ are independent we can use
Proposition 8 and the Taylor expansion (49) to obtain:

Proposition 11:  For (SP) with independent constraints, a second order
approximation $\hat P_E(x)$ of $P_E(x)$ is

$$\hat P_E(x) = \frac12 \sum_{i=1}^m \frac{1}{\sigma_i^2(x)} \left[ (a_i - m_i(x))^+ \right]^2$$

where

$m_i(x) = E g_i(x,b_i)$,  $\sigma_i^2(x)$ = variance of $g_i(x,b_i)$.

□
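The separable formula of Proposition 11 can be cross-checked against the scalar case of Proposition 9. This is a minimal sketch with hypothetical function names; it assumes the caller supplies $a_i$, $m_i(x)$ and $\sigma_i^2(x)$ at a fixed $x$, and it uses the fact that the concave quadratic $y(a-m) - \tfrac12\sigma^2 y^2$ is maximized over $y \ge 0$ at $y = (a-m)^+/\sigma^2$.

```python
def approx_penalty(a, m, var):
    """Proposition 11: 0.5 * sum_i ((a_i - m_i)^+)^2 / var_i."""
    return 0.5 * sum(max(ai - mi, 0.0) ** 2 / vi
                     for ai, mi, vi in zip(a, m, var))

def approx_penalty_dual_1d(a, m, var):
    """Scalar Proposition 9: sup over y >= 0 of y*(a - m) - 0.5*var*y^2."""
    y = max(a - m, 0.0) / var      # maximizer of the concave quadratic on y >= 0
    return y * (a - m) - 0.5 * var * y * y
```

A constraint that is feasible in the mean ($a_i \le m_i(x)$) contributes nothing, and summing `approx_penalty_dual_1d` over the constraints reproduces `approx_penalty`.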

For stochastic RHS programs (SP-RHS) the above approximation
simplifies as follows: let $\mu = Eb$, denote by $V$ the variance-covariance
matrix of $b$, and by $\sigma_i^2$ the variance of $b_i$. Then, by
Proposition 9 (with $a = 0$ and $m(x) = \mu - g(x)$),

$$\hat P_E(x) = \sup_{y \ge 0} \{ y^T(g(x) - \mu) - \tfrac12\, y^T V y \}.$$

When $b \sim N(\mu, V)$ the approximation is exact; compare with Example 2.

The approximate entropic penalty $\hat P_E$ induces an Approximate
Deterministic Primal problem to (SP):

(ADP)  $\inf_x \{ g_0(x) + p\,\hat P_E(x) \}$.
By Proposition 9, this problem can be stated in terms of the saddle
function

$$\hat k(x,y) = g_0(x) + p\,[\, y^T(a - m(x)) - \tfrac12\, y^T V(x) y \,]$$

as

(ADP)  $\inf_x \sup_{y \ge 0} \hat k(x,y)$.

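In the case of independent constraints, an explicit representation
of (ADP), based on Proposition 11, is

(ADP)  $\inf_x \left\{ g_0(x) + \frac{p}{2} \sum_{i=1}^m \frac{1}{\sigma_i^2(x)} \left[ (a_i - m_i(x))^+ \right]^2 \right\}$.

This further simplifies for a Stochastic RHS problem to (see Prop. 11):

$$\inf_x \left\{ g_0(x) + \frac{p}{2} \sum_{i=1}^m \frac{1}{\sigma_i^2} \left[ (g_i(x) - \mu_i)^+ \right]^2 \right\}$$  (54)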

Remark 7:  If the variance of $b_i$ ($\sigma_i^2$) is large, then as seen from
(54), the contribution of the i-th constraint to the penalty is
small. Therefore, "ambiguous constraints" are effectively ignored in
the Approximate Deterministic Primal. The quantity $1/\sigma_i^2$ thus
serves as a "built-in" penalty parameter for the i-th constraint.
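The role of $1/\sigma_i^2$ and the quality of the quadratic approximation can be illustrated on the Gamma case of Example 4. The sketch below is an illustration with hypothetical function names: it compares one term of the exact penalty (37) with the corresponding term $\tfrac12[(g_i(x)-\mu_i)^+]^2/\sigma_i^2$ of the mean-variance approximation (without the factor $p$). Since $s - \log(1+s) \approx s^2/2$ for small $s$, the two should nearly agree for small violations.

```python
import math

def gamma_exact(g, mu, var):
    """One term of (37): (mu^2/var) * (s - log(1+s)), s = (g - mu)^+ / mu."""
    s = max(g - mu, 0.0) / mu
    return (mu * mu / var) * (s - math.log1p(s))

def quadratic_approx(g, mu, var):
    """One term of the mean-variance approximation: 0.5*((g - mu)^+)^2 / var."""
    return 0.5 * max(g - mu, 0.0) ** 2 / var
```

For $\mu_i = 2$, $\sigma_i^2 = 1$ and a small violation $g_i(x) = 2.01$ the two values differ by well under one percent, and increasing the variance from 1 to 10 reduces the approximate penalty tenfold, which is the "built-in" weighting described in Remark 7.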

Remark 8:  The approximate penalty function $\hat P_E$ does not necessarily
possess the property that surely infeasible solutions are ruled out.
Therefore, in (ADP) one should add the constraints $g(x) \ge a$ (see
Chap. 2). For (SP-RHS) the added constraints are $g(x) \le b_{\max}$.